Improving What Is Published

Abstract

Researchers and practitioners of psychology have expressed considerable dissatisfaction with much of what is published in professional journals. Three major areas of discontent are as follows: (a) Many articles focus on irrelevant topics; (b) the use of statistical significance testing often results in meaningless or unusable findings; and (c) the decision-making process for manuscript acceptance/rejection may be biased. Each of these issues is discussed, and an alternative model for manuscript submission is proposed. The advantages and limitations of this model are presented as related to the three areas of current dissatisfaction.

Correspondence concerning this article should be addressed to Joel Kupfersmid, 1560 Callander, Hudson, OH 44236.

Much dissatisfaction has been expressed regarding the quality of published articles and the decision-making process in manuscript acceptance and rejection. The purpose of this article is to review three major areas of discontent and to propose a model of manuscript submission and review that may greatly mitigate the current malaise.¹ The three major concerns regarding publication practice to be discussed are relevancy, meaningfulness, and bias.

¹ The three areas of dissatisfaction are not intended to be an exhaustive list. Rather, these issues represent areas where dissatisfaction can be reduced if the proposed model is instituted.

The Issue of Relevancy

Lindsey (1977) asked, "How is it that so much triviality, illiteracy, and dullness is yearly entered into the scientific publication stream?" (p. 579). Richard Nisbett's (1978) recommendations to psychologists provide a partial answer to Lindsey's question. Nisbett advised researchers interested in increasing the chances of publishing their empirical investigations to avoid creative or innovative experimental designs and to concentrate their efforts on areas that are easy to test and are noncontroversial.

Unobtrusive, circumstantial data suggest that disenchantment with what appears in journal articles exists throughout the profession of psychology. A survey conducted by Garvey and Griffith (1971) indicated that for any given study, only about 200 psychologists will read its contents within the first 60 days it appears in print. In the absence of comparable data from other scientific disciplines, it is difficult to determine whether this finding indicates that psychologists' interest in reading research is representative of most scientists, or that psychologists find few studies worthy of attention.

Garfield's (1972) data relate to this issue. He formulated an "impact factor" (i.e., the average number of citations per published article, corrected for the number of articles a journal publishes yearly) for the 152 most frequently cited journals in science and technology. The two most often cited journals in psychology are Psychological Review and Psychological Bulletin (ranked 35th and 50th, respectively). However, Psychological Review mainly contains theoretical articles, and Psychological Bulletin consists primarily of reviews of research (Markle & Rinn, 1977). The most frequently cited psychological journal of an experimental nature (ranked 117th) is the Journal of the Experimental Analysis of Behavior (Markle & Rinn, 1977). This journal's impact factor is 2.3, suggesting that even articles from the most often cited experimental journal in psychology are often viewed as inconsequential and unlikely to be referenced in articles published in other journals. Garfield's (1972) results led one reviewer (of this article) to conclude, "One would like to think that if scientists were content with what is currently being published, they would pay more attention to it (cite it) when they write their own papers" (Anonymous personal communication, September 11, 1987).
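Garfield's parenthetical definition can be stated compactly. The formula below is my rendering of the conventional two-year impact factor; the two-year citation window is an assumption on my part, since the text says only that citation counts are corrected for the number of articles a journal publishes yearly:

\[
\text{impact factor}_Y \;=\; \frac{\text{citations received in year } Y \text{ by articles published in years } Y-1 \text{ and } Y-2}{\text{number of articles published in years } Y-1 \text{ and } Y-2}
\]

On this reading, the Journal of the Experimental Analysis of Behavior's 2.3 means roughly two citations per recent article; for instance, 200 recent articles drawing 460 citations in a year gives 460 / 200 = 2.3.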
The above data suggest that much of psychology does not advance by an accumulative progression of empirically verifiable facts; rather, many investigators conduct isolated studies that rarely aid colleagues in their efforts to understand psychological phenomena.

Results of surveys exploring practitioners' concerns with publication practices have yielded uniform results over the years. Practitioners report that psychotherapy research has little value for clinical application. When psychologists are asked to rank order the usefulness of informational sources to their practice, research articles and books of empirical research are consistently rated at the bottom of the scale (Cohen, 1979; Cohen, Sargent, & Sechrest, 1986; Morrow-Bradley & Elliot, 1986). Morrow-Bradley and Elliot's (1986) review of 18 references concludes, "With virtual unanimity, psychotherapy researchers have argued that (a) psychotherapy research should yield information useful to practicing therapists, (b) such research to date has not done so, and (c) this problem should be remedied" (p. 188). The end result is that many forms of therapy are adopted before data demonstrating effectiveness are available (Barlow, Hayes, & Nelson, 1984). When survey respondents are asked to list the specific aspects of psychotherapy research that contribute most to their dissatisfaction, the relevancy of topics addressed, the manner in which hypotheses are tested, and the inappropriateness of statistical significance testing emerge as prime areas of discontent.

No attitude survey was found that directly asked researchers and academicians their opinion concerning the relevancy of published studies in psychology. Given the above information regarding the opinions and behaviors of psychologists in general and the attitudes expressed by practitioners, it would not be surprising if experimentalists shared many of the same concerns.

The Concern for Meaningfulness

The paradigm for most experimental and correlational studies consists of postulating no difference (null hypothesis) between groups or variables, a hypothesis that the researcher then attempts to refute. Statistical significance testing is the method most experimenters adopt to determine whether the null hypothesis may be confidently rejected or retained. Significance testing involves selecting a level of probability (p value) to determine "how improbable an event could be under the null hypothesis" (Bakan, 1966, p. 429). Usually a probability of 5% or less (p < .05) is selected. "Thus the p value may be used to make a decision about accepting or rejecting the idea that chance caused the results. This is what statistical significance testing is--nothing more, nothing less" (Carver, 1978, p. 387).

The first problem is that the no-difference (null) hypothesis is never capable of being retained; that is, the null hypothesis is always false to begin with (Bakan, 1966; Greenwald, 1975; Meehl, 1978).
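A small simulation makes the point concrete. The sketch below is my own illustration, not material from the article: it builds in a true between-group difference of one twentieth of a standard deviation, a difference of no practical consequence, and shows that a two-sample t test nonetheless rejects the null hypothesis once the samples are large enough.

```python
# Illustrative sketch (mine, not from the article): the null hypothesis of
# exactly no difference is false by construction here, but only trivially so
# (the true difference is 0.05 SD). Whether the t test "detects" it is
# purely a question of sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_diff = 0.05                       # a practically negligible difference

for n in (20, 200, 20_000):            # subjects per group
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_diff, 1.0, n)
    _, p = stats.ttest_ind(a, b)
    print(f"n per group = {n:>6}: p = {p:.4f}")

# Typically p stays well above .05 at n = 20 or 200 but drops far below
# .05 at n = 20,000, although the effect is no more meaningful than before.
```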
" I f by the null hypothesis one refers to the hypothesis of exactly no difference or exactly no correlation, and so forth, then the initial probability of the null hypothesis being true must be regarded effectively as zero" (Greenwald, 1975, p. 6). Lykken (1968/1970) added that it is "foolish" even to suppose that the difference between two groups, or the correlation between two variables, is ever zero. Researchers can confidently make this claim because no two groups of subjects or variables are ever effectively equal; rather, the null hypothesis will always be rejected tfthe experimenter has a large enough sample N. Meehl (1978) concluded that "reliance on merely refuting the null h y p o t h e s i s . . , is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology" (p. 817). A second problem related to the meaningfulness of "statistically significant" findings is that what is significant statistically and what is "significant" in a meaningful sense may be contradictory. The question is never whether the result is statistically significant, but rather, at what N the data would reach statistical significance. Clinicians have expressed dissatisfaction with the use of statistical significance testing because the procedure bears no relationship to the size effect (i.e., degree of difference between two samples and/or degree of correlation; Smith, Glass, & Miller, 1980). The word "significance" is used even for trivial findings. Furthermore, statistical significance is a measure of group effects, whereas the practitioner is concerned with individuals (Barlow et at., 1984). Fifty-two percent of psychotherapists surveyed (Morrow-Bradley & Elliot, 1986) criticized the use of statistical significance testing because the information does not address the important question of how many subjects changed or to what degree. Quoting Bergin and Strupp (1972, p. 440), Barlow et al. (1984) noted, In the area of psychotherapy, the kinds of effects we need to demonstrate . . , should be significant enough so that they are readily observable by inspection or descriptive statistics. If this cannot be done, no fixation upon statistical and mathematical niceties will generate fruitful insights. (p. 28) Furthermore, in all of experimentation, it is critical to collect data that are relevant to psychological inquiry such that, regardless of outcome, via statistical manipulation, meaningful results are generated. Jacobson, Follette, and Revenstorf (1984) directly addressed this issue by eschewing traditional statistical significance tests for psychotherapy outcome research and argued in favor of a "clinical significance" test. The authors provided a summary of five such measures as well as presenting their own approach: "Therefore, we propose that a change in therapy is clinically significant when the client moves from the dysfunctional to the functional range during the course of therapy on whatever variable is being used to measure the clinical problem" (p. 340). The problem associated with generalization of findings to clients seen in a clinician's office has also received critical comment. Statistical inference is contingent on the criterion of random sampling techniques, and rarely, if ever, is this criterion satisfied in psychotherapy studies (Bakan, 1966; Barlow et at., 1984). Statistical inference is tied to sampling theory. 
The problem associated with generalizing findings to clients seen in a clinician's office has also received critical comment. Statistical inference is contingent on the criterion of random sampling, and rarely, if ever, is this criterion satisfied in psychotherapy studies (Bakan, 1966; Barlow et al., 1984). Statistical inference is tied to sampling theory. A sample employed in research must represent, or closely approximate, individuals seen in clinical practice. Information that cannot be inferred or generalized beyond the sample of a particular study is of no value to the practitioner. Thus, even if random samples are obtainable, the question usually remains whether the sample would be too heterogeneous or too homogeneous to generalize to specific clients (Barlow et al., 1984).

Another area of concern relates to the decision theory of acceptance/rejection of the null hypothesis based on statistical significance testing. The current use of statistical significance testing limits the outcome of hypothesis testing to two choices: Retain the null (p > .05) or reject the null (p < .05).

What scientist in his [sic] right mind would ever feel there to be an appreciable difference between the interpretative significance of data, say, for which one-tailed p = .04 and that of data for which p = .06, even though the point of "significance" has been set at p = .05? (Rozeboom, 1960, p. 424)

Carver (1978) summarized the current dissatisfaction with hypothesis testing and the test of statistical significance as follows:

If we can control statistical significance simply by changing sample size, if statistical significance is not equivalent to scientific significance, if statistical significance testing corrupts the scientific method, and if it has only questionable relevance to one out of fifteen threats to research validity, then I believe we should eliminate statistical significance testing in our research. (p. 392)

If this were not enough, statistical significance testing and the associated p values are often misinterpreted. The most noticeable misinterpretations are (a) that the p value reflects the probability that the results are due to chance, (b) that p values represent the probability of obtaining the same results upon experimental replication, and (c) that the p value reflects the probability that the research hypothesis is true (Bakan, 1966; Carver, 1978).

Problems with misinterpretation of p values are reflected in the results of four studies involving psychologists, journal editors and reviewers, professors, and graduate students. The paradigm of all studies involved asking participants to rate their degree of belief or confidence in the results of hypothetical studies given varying p values (from .001 through .90) with either a small sample size (10 subjects) or a larger sample size (100-200 subjects). Across all conditions, participants placed greater confidence in hypothetical studies having larger numbers of subjects even if the p values for both samples were exactly the same. Psychologists ignored the fact that the mathematics of significance testing takes into account the size of the sample. They failed to realize that small samples require a greater disparity between groups in order to reach the same p value as studies utilizing a larger number of subjects (Bakan, 1966; Carver, 1978).²
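That last point is simple arithmetic. The sketch below is my own illustration rather than anything from the studies cited: holding the pooled standard deviation at 1 and the two-tailed criterion at p = .05, it computes the smallest raw group difference that just reaches significance at each sample size.

```python
# Hedged illustration: the group disparity needed to just reach p = .05
# (two-tailed, two-sample t test, pooled SD = 1) shrinks as n grows, so
# identical p values from different sample sizes imply very different
# underlying differences between groups.
from math import sqrt
from scipy import stats

for n in (10, 100, 200):                # subjects per group
    df = 2 * n - 2
    t_crit = stats.t.ppf(0.975, df)     # critical t at alpha = .05, two-tailed
    min_diff = t_crit * sqrt(2.0 / n)   # smallest significant difference, in SDs
    print(f"n = {n:>3}: difference of {min_diff:.2f} SD needed for p = .05")

# Roughly: n = 10 demands a difference near a full standard deviation,
# while n = 200 needs only about a fifth of one.
```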
In spite of the plethora of rational arguments against its use, the current method of hypothesis testing by statistical significance test continues. In spite of the profuse dissatisfaction with the end product of such research--irrelevant studies and meaningless results--the system flourishes. Why? Three factors seem to interact to maintain the current practice.

First, as previously discussed, many psychologists continue to misinterpret statistical significance tests and associated p values. Too often, experimenters believe that p values express confirmation of the experimental hypothesis and that p values represent a measure of confidence in the repeatability of the experiment. Additionally, many psychologists seem unaware of (or deny) the ease with which sample size affects the rejection of the null hypothesis and the establishment of statistical significance. Many researchers believe they are "discovering" something when, in fact, they are not.

² Critics of statistical significance testing advocate a variety of statistical alternatives, including Bayes' theorem (Bakan, 1966; Greenwald, 1975); the decision theory of Neyman, Pearson, and Wold (Bakan, 1966); omega squared (Carver, 1978); eta squared (Carver, 1978); interval estimation (Greenwald, 1975); and greater use of descriptive statistics (Carver, 1978).

Second, there is a bias among editors and reviewers toward publishing almost exclusively studies that reject the null hypothesis via statistical significance testing. Because careers and reputations are often tied to publication, there will be no change in the nature of what is published (regardless of relevancy or meaningfulness) until the decision-making practice of journal editors is altered. The bias that editors and reviewers exhibit with respect to statistical significance testing is addressed in the following section of this article.

Third, Kuhn's (1970) position on the nature of change in the history of scientific revolutions seems to be operative. Kuhn noted that current paradigms often persist, regardless of inadequacy, until there are alternative paradigms that can take their place. It is my intention in this article to propose an alternative method of manuscript submission that, if employed, may offer a substantial improvement over the current system.

The Editorial Bias Controversy

The three areas of editorial bias discussed below have received considerable attention in the literature. If any of the three charges has merit, there is an additional argument for reconsidering the current method of manuscript decision making in favor of an alternative model.

The first contention of bias is that only those manuscripts that report rejection of the null hypothesis by use of statistical significance testing get published. This bias is the most dangerous of all because it would mean that the data bank of psychological knowledge is filled with Type I errors (rejection of a true null hypothesis). Type I error is more serious than Type II error (rejection of a true research hypothesis) because when a Type I error appears in print it often stops researchers from studying the phenomena and/or reporting nonsignificant results (Bakan, 1966). Rosenthal (1979) termed this the "file drawer problem" in that "the journals are filled with the 5% of the studies that show Type I errors, while the file drawers back at the lab are filled with 95% of the studies that show nonsignificant (e.g., p > .05) results" (p. 638).

However, do findings of nonsignificant difference actually lead researchers to "file" their studies? Do findings of significant differences encourage researchers to submit their results? Do journals tend to publish only experiments in which statistical significance establishes rejection of the null hypothesis? The current data suggest the answers are all affirmative.
Greenwald (1975) presented evidence from a survey in which authors indicated a 50% chance that they would submit a manuscript if the null hypothesis was rejected and a 6% chance of submission if the null hypothesis was retained. Similarly, Sterling (1959/1970) reviewed the number of articles published in four psychology journals in which the null hypothesis was retained. Of the 362 articles, 8 retained the null hypothesis, and none of the studies replicated previous experiments. Sterling also selected 100 research titles at random from Psychological Abstracts and found that 95 articles rejected the null hypothesis, 5 failed to reject the null hypothesis, and 1 was a replication study. A more recent study conducted by Greenwald (1975) on all articles published (N = 199) by the Journal of Personality and Social Psychology revealed that only 12% of the articles retained the null hypothesis.³

The outcry regarding editorial bias in favor of null hypothesis testing via statistical significance testing has been heated. A few representative quotes will demonstrate the intensity of concern:

The use of statistical tests of significance are not likely to decline until one or more journal editors speak against statistical significance testing. . . . (Carver, 1978, p. 397)

The stranglehold that conventional null-hypothesis significance testing has clamped on publication standards must be broken. (Rozeboom, 1960, p. 428)

If one could no longer use statistical significance to determine the "significance" of a difference, researchers would be forced to use designs that more clearly reveal the scientific importance of a difference. (Carver, 1978, p. 397)

When passing null hypothesis tests becomes the criterion . . . for journal publications, there is no pressure on the psychology researcher to build a solid, accurate theory; all he or she is required to do, it seems, is produce "statistically significant" results. (Dar, 1987, p. 149)

The moral of this story is that the finding of statistical significance is perhaps the least important attribute of a good experiment: It is never a sufficient condition . . . that an experimental report ought to be published. (Lykken, 1968/1970, p. 278)

Support for the null hypothesis must be regarded as a research outcome that is as acceptable as any other. (Greenwald, 1975, p. 16)

³ Rosenthal (1979) proposed a formula for estimating the number of studies in the file drawers (or those that would need to be published in the future) that retain the null hypothesis, based on the current number of studies in print that report statistically significant findings. Essentially, Rosenthal's formula involves (a) transforming into Z scores the p values reported in each study, (b) computing the mean Z score for these studies, and (c) multiplying this product by the number of studies gathered. Rosenthal noted, however, that only six studies in the file drawer that support the null hypothesis are necessary when there are as many as 15 studies published that report statistically significant findings. Rosenthal's formula is best suited for areas where large numbers of experiments have been conducted. Another caveat regarding this formula is that the p value is partially increased or decreased contingent on the sample N. The formula does not solve the problem of knowing, before a study is initiated, that statistical significance is a likely outcome when large numbers of subjects are employed.
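Footnote 3's recipe can be made concrete. The algebraic form below, X = (sum of Z)^2 / 1.645^2 - k, is the form in which Rosenthal's fail-safe N is commonly written; treating it as equivalent to the footnote's verbal steps is my rendering, and the p values are invented for illustration.

```python
# Sketch of Rosenthal's (1979) fail-safe N in its commonly cited algebraic
# form: X filed studies averaging Z = 0 reduce the combined Stouffer Z,
# sum(Z) / sqrt(k + X), to the one-tailed .05 threshold of 1.645. Solving
# for X gives X = (sum Z)^2 / 1.645^2 - k. All p values here are invented.
from scipy import stats

p_values = [0.001, 0.01, 0.02, 0.03, 0.05]        # k hypothetical studies
k = len(p_values)

z_scores = [stats.norm.isf(p) for p in p_values]  # step (a): one-tailed p -> Z
sum_z = sum(z_scores)                             # steps (b)-(c): mean Z times k

fail_safe_n = sum_z ** 2 / 1.645 ** 2 - k
print(f"{k} significant studies tolerate roughly {fail_safe_n:.0f} filed null results")
```

This also makes the caveats that follow easy to see: because each Z is inflated by sample size as well as by effect size, a handful of huge-N studies can buy a large fail-safe N without any meaningful effect behind it.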
Additionally, methodologically sound studies with smaller Ns are more likely to have larger p values than methodologically questionable studies involving hundreds of subjects. Rosenthal's formula would potentially give greater weight to the latter type of studies, in spite of the fact that the former require a greater difference between means to produce statistically significant results. In an earlier work, Rosenthal (1978) provided a summary of nine other methods that may be used to combine the results of independent studies.

Rosenthal's formula has some merit in reducing the concern about Type I error when large numbers of studies are available in a given area. Unfortunately, this procedure does not reduce concerns about issues related to research relevancy, the meaningfulness of experimental results, misinterpretations of the meaning of statistical significance, or editorial bias. There is little evidence that these concerns have resulted in a change of publication decision making. The American Psychological Association's (APA's) Publication Manual (1983) lists "reporting of negative results" as a major "defect" editors find in papers submitted.

The second form of alleged bias suggests that manuscripts are published on the basis of the submitter's status in the field and/or the prestige of the author's institutional affiliation. Implicit in this form of bias is the belief that editors and reviewers are either incapable of discriminating among manuscripts in order to choose those that are truly exceptional in relevance and methodological rigor, or that there are so few exceptional studies that professional status becomes the informal, unspoken criterion for manuscript decision making. Those who charge such bias maintain that removing the title page from a manuscript before review is inadequate because (a) authors frequently refer to their previous work in the text, (b) many experimenters have a unique style of conducting research that others in the field can easily recognize, and (c) there is often a small network of researchers in a given field and manuscripts are shared among these individuals. Thus, it is highly probable that a reviewer would be familiar with a submitter's work (Ceci
